6 Types of Machine Learning Methods You Must Know in Data Science
The speed and complexity of today's unmanaged data call for specialized tools that can handle it while remaining friendly for beginners. Two such tools are Machine Learning and Data Science.
Data Science is not a standalone discipline but a combination of several fields that each play an important role, including mathematics, statistics, machine learning, programming languages, business understanding, and other supporting soft skills.
Machine Learning is one of the branches of science within Data Science. Its methods are widely used by Data Scientists to extract valuable information hidden in Big Data. With the help of Machine Learning, data analysis becomes easier because there is no need to perform calculations manually.
One of the benefits of using Machine Learning in Data Science is that a model is trained on data, learns from it, and finds patterns in it to make predictions. Machine Learning can learn from data on its own and does not need to be reprogrammed periodically.
There are at least three Machine Learning techniques that are widely known in Data Science: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. All three have different characteristics and uses.
Currently, there are many Machine Learning methods that Data Scientists can use. As an aspiring Data Scientist, you should know which of these methods you need to master, right?
Well, read this article to the end to learn which Machine Learning methods a Data Scientist must master.
- Decision Tree
The Decision Tree is a very easy-to-understand algorithm for object classification. It is one of the Supervised Learning algorithms, and it works by splitting the data into subsets based on the input variables.
The algorithm is a kind of flowchart that supports the decision-making process: the Decision Tree is a decision support tool that uses a graph or model shaped like a tree.
Basically, a Decision Tree starts from a single node. That node then branches to represent the available options, and each branch can in turn split into new branches. The method is called a "tree" because its shape resembles a tree with many branches.
Quoting from Venngage, a Decision Tree has three elements, namely:
- Root node (root): the ultimate goal or major decision to be taken.
- Branches (twigs): the various action options.
- Leaf node (leaf): the possible results of each action.
The graph consists of the minimum number of yes/no questions needed to assess each possibility. These assessments provide a structured and systematic way to make decisions and arrive at the right conclusion.
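As a rough illustration, here is a minimal sketch of a Decision Tree classifier, assuming scikit-learn and its built-in iris dataset (both choices are mine for illustration, not taken from the article):

```python
# Minimal Decision Tree sketch, assuming scikit-learn and the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each internal node splits the data on one input variable (feature).
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print("Decision Tree accuracy:", tree.score(X_test, y_test))
```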
- Random Forest
In Machine Learning you will often hear about the Random Forest method being used to solve problems. Random Forest is an ensemble method built on top of the Decision Tree.
A Random Forest combines many individual trees into a single model. The trees are grown from random vectors drawn from the same distribution, and each Decision Tree is grown to a maximum depth.
The basic principle of Random Forest is therefore similar to that of the Decision Tree. Each Decision Tree produces its own output, and the Random Forest takes a vote across all of them. Simply put, Random Forest outputs the majority result of all its Decision Trees.
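The voting idea can be sketched as follows, again assuming scikit-learn and the iris dataset purely for illustration:

```python
# Minimal Random Forest sketch, assuming scikit-learn and the iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators trees are trained on random samples of the data; the forest
# then takes a majority vote over their individual predictions.
forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))
```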
- K-Nearest Neighbor Classifier (KNN)
K-Nearest Neighbor is a Supervised Learning method in which new input data is classified based on the training data whose values are closest to it.
The K-Nearest Neighbor (KNN) algorithm works by classifying an object based on the training examples nearest to it: the class that appears most often among those neighbors becomes the predicted class.
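Here is a minimal KNN sketch, again assuming scikit-learn and the iris dataset; the choice of k = 5 is an arbitrary example value:

```python
# Minimal KNN sketch, assuming scikit-learn and the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each test point is assigned the most common class among its
# 5 closest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))
```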
- Hierarchical Clustering
Hierarchical Clustering is a clustering technique in which the algorithm builds a hierarchy of groups level by level, resembling a tree structure. The grouping process is therefore carried out in stages. This method is usually applied to data that is not too large and where the number of clusters to be formed is unknown.
In principle, Hierarchical Clustering groups the data step by step based on the similarity of each observation, so that at the end of the hierarchy the resulting clusters differ from one another while objects within the same cluster are similar to each other.
The hierarchical method has two grouping strategies, namely Agglomerative and Divisive. Agglomerative Clustering (the merging approach) starts with each object in its own cluster and then merges clusters into progressively larger ones, so the initial number of clusters equals the number of objects. Divisive Clustering (the splitting approach) starts with all objects grouped into a single cluster and then splits it until each object is in a separate cluster.
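A minimal sketch of the agglomerative strategy, assuming scikit-learn and the iris features (the target number of 3 clusters is an assumption for illustration):

```python
# Minimal agglomerative hierarchical clustering sketch, assuming scikit-learn.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

# Agglomerative strategy: each point starts in its own cluster and the two
# most similar clusters are merged at every step until 3 clusters remain.
clustering = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = clustering.fit_predict(X)
print("Cluster sizes:", [list(labels).count(c) for c in set(labels)])
```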
- Naive Bayes
Naive Bayes is a Machine Learning algorithm for classification that offers computational efficiency and good accuracy, especially for high-dimensional data and large amounts of data. However, the algorithm's performance degrades when the attributes are strongly dependent on each other, since it assumes they are independent.
Several approaches can mitigate this problem, such as attribute selection, structure extension, or weighting each attribute. Real-world examples of Naive Bayes classification include marking emails as spam or not, classifying news articles by category, and even supporting facial recognition software.
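The spam example can be sketched as follows; this assumes scikit-learn, and the tiny messages and labels are made up purely for illustration:

```python
# Minimal Naive Bayes spam-style sketch, assuming scikit-learn; toy data only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at 10 tomorrow",
            "free cash offer claim now", "see you at lunch"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (made-up labels)

# Word counts as features; MultinomialNB applies Bayes' rule, treating the
# word counts as conditionally independent given the class.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["claim your free prize"])))
```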
- Support Vector Machine (SVM)
SVM is an algorithm generally used for classification and regression. In Machine Learning, SVM is a supervised learning model used for data analysis and pattern recognition. Its basic approach is to take a set of input data and, for each input, predict which of two possible output classes it belongs to.
In classification modeling, SVM has a more mature and mathematically well-defined foundation than many other classification techniques. It can also handle both linear and non-linear classification and regression problems.
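To close, here is a minimal SVM sketch, assuming scikit-learn and the iris dataset; the RBF kernel is chosen here to illustrate the non-linear case:

```python
# Minimal SVM sketch, assuming scikit-learn and the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The SVM looks for the decision boundary with the widest margin between
# classes; the RBF kernel allows that boundary to be non-linear.
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))
```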